Reduce initial pipeline load time by 4-5x (1/3) #149

Rypo · 2024-11-28T02:09:22Z

Changes

Adds a from_pretrained(low_cpu_mem_usage=True) option (akin to the transformers implementation, but greatly simplified) to OmniGen and OmniGenPipeline.
Uses the accelerate init_empty_weights context manager when initializing the model. This avoids slow CPU weight initialization, particularly during self.initialize_weights().

These weights are immediately overwritten when the state_dict is loaded. This means we can safely bypass initialization without consequence.

Additionally, this can be achieved with no additional libraries beyond those in requirements.txt. As such, I set the default to low_cpu_mem_usage=True.

Results

From my tests, this change:

Reduces the initial pipeline load time by 4-5x and
Decreases peak initial RAM usage by 10-15GB

Cold Load

New process + memory freed

low_cpu_mem_usage	avg load time	RAM usage
True	9.53s	18GB
False	41.56s	28GB

Hot Load

pipe.from_pretrained...; del pipe; gc.collect(); pipe.from_pretrained...

low_cpu_mem_usage	avg load time	RAM usage
True	5.07s	18GB
False	36.64s	33GB
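For anyone wanting to reproduce the hot-load timing, the procedure can be sketched roughly as follows. This is one reading of the steps above, not the script used for these numbers: time_hot_loads and load_fn are hypothetical names, and the sketch assumes only the reloads after the first load are timed.

```python
import gc
import time
from statistics import mean


def time_hot_loads(load_fn, repeats: int = 3) -> float:
    """Average wall-clock seconds for reloading after freeing the previous object.

    load_fn is a hypothetical zero-argument callable, e.g. a lambda that
    wraps a pipeline's from_pretrained call.
    """
    obj = load_fn()  # initial load, not timed
    timings = []
    for _ in range(repeats):
        del obj          # drop the previous pipeline
        gc.collect()     # ask Python to release its memory
        start = time.perf_counter()
        obj = load_fn()  # the timed "hot" load
        timings.append(time.perf_counter() - start)
    return mean(timings)


# Usage with a trivial stand-in for the real loader:
avg_seconds = time_hot_loads(lambda: [0] * 100_000, repeats=3)
print(f"avg hot load: {avg_seconds:.4f}s")
```

This covers timing only. The RAM figures would need a separate process-level measurement.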

This is the first of 3 PRs I'm issuing to improve performance/fix errors. I've tried to keep each incremental change as small in scope as possible. PRs: 1. This, 2. #150, 3. #151

Rypo added 2 commits November 25, 2024 19:39

feat: fast model loading with accelerate

387f48c

Prevents slow CPU initialization of model weights on load by using accelerate `init_empty_weights`. Completely compatible with from_pretrained, since the weights will always be overwritten by the state_dict. Fixes VectorSpaceLab#72

fix: avoid moving model to device prematurely

0287b50

This was referenced Nov 28, 2024

Fix RuntimeError: CUDA error: out of memory on CPU transfer (2/3) #150

Open

Adds support for 4bit (nf4) and 8bit bitsandbytes quantization (3/3) #151

Open

Rypo added 2 commits December 6, 2024 12:00

Merge branch 'main' into fast_load_model

0360484

Merge branch 'main' into fast_load_model

f3d7f01
